Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for
the automatic segmentation of speech into topically coherent units. We propose
two methods for combining lexical and prosodic information using hidden Markov
models and decision trees. Lexical information is obtained from a speech
recognizer, and prosodic features are extracted automatically from speech
waveforms. We evaluate our approach on the Broadcast News corpus, using the
DARPA-TDT evaluation metrics. Results show that the prosodic model alone is
competitive with word-based segmentation methods. Furthermore, we achieve a
significant reduction in error by combining the prosodic and word-based
knowledge sources.
Comment: 27 pages, 8 figures
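A minimal sketch of the kind of probabilistic combination the abstract describes: a word-based boundary posterior and a prosodic boundary posterior are merged by log-linear interpolation, and boundaries are placed where the combined score is high. All probabilities, weights, and thresholds here are invented for illustration; the paper's actual models are HMMs and decision trees trained on Broadcast News.

```python
import math

def combine_boundary_scores(p_lexical, p_prosodic, weight=0.5):
    """Log-linear interpolation of two boundary posteriors.

    p_lexical:  boundary probability from a word-based model (hypothetical).
    p_prosodic: boundary probability from a prosodic model (hypothetical).
    weight:     interpolation weight given to the lexical model.
    """
    log_p = weight * math.log(p_lexical) + (1 - weight) * math.log(p_prosodic)
    return math.exp(log_p)

def segment(boundary_probs, threshold=0.5):
    """Place a topic boundary wherever the combined posterior exceeds threshold."""
    return [i for i, p in enumerate(boundary_probs) if p > threshold]

# Toy posteriors at four candidate boundaries (invented numbers).
lexical  = [0.9, 0.2, 0.6, 0.1]
prosodic = [0.8, 0.3, 0.7, 0.2]
combined = [combine_boundary_scores(l, p) for l, p in zip(lexical, prosodic)]
print(segment(combined))  # → [0, 2]: boundaries where both knowledge sources agree
```

The log-linear form means a boundary must be plausible under both knowledge sources to survive, which is one simple way the two models can reduce each other's errors.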
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction,
topic detection, or browsing/playback is to segment the input into sentence and
topic units. Speech segmentation is challenging, since the cues typically
present for segmenting text (headers, paragraphs, punctuation) are absent in
spoken language. We investigate the use of prosody (information gleaned from
the timing and melody of speech) for these tasks. Using decision tree and
hidden Markov modeling techniques, we combine prosodic cues with word-based
approaches, and evaluate performance on two speech corpora, Broadcast News and
Switchboard. Results show that the prosodic model alone performs on par with,
or better than, word-based statistical language models -- for both true and
automatically recognized words in news speech. The prosodic model achieves
comparable performance with significantly less training data, and requires no
hand-labeling of prosodic events. Across tasks and corpora, we obtain a
significant improvement over word-only models using a probabilistic combination
of prosodic and lexical information. Inspection reveals that the prosodic
models capture language-independent boundary indicators described in the
literature. Finally, cue usage is task and corpus dependent. For example, pause
and pitch features are highly informative for segmenting news speech, whereas
pause, duration and word-based cues dominate for natural conversation.
Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2), Special Issue on Accessing Information in Spoken Audio, September 2000
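The abstract notes that pause and pitch features are the most informative cues for segmenting news speech. A hand-built two-feature decision tree, in the spirit of the paper's CART-style prosodic models, can illustrate that idea; the thresholds and posterior values below are invented, not taken from the paper.

```python
def boundary_from_prosody(pause_ms, f0_reset_hz):
    """Toy decision tree over two prosodic features (invented thresholds).

    Long pauses strongly suggest a sentence/topic boundary; a large pitch
    reset after a medium pause adds further evidence.
    """
    if pause_ms > 500:                              # long pause: near-certain boundary
        return 0.9
    if pause_ms > 150:                              # medium pause: pitch reset decides
        return 0.7 if f0_reset_hz > 40 else 0.4
    return 0.1                                      # short/no pause: boundary unlikely

print(boundary_from_prosody(620, 10))   # → 0.9
print(boundary_from_prosody(200, 55))   # → 0.7
print(boundary_from_prosody(50, 80))    # → 0.1
```

In the paper such trees are learned automatically from forced alignments, which is why no hand-labeling of prosodic events is needed.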
A Statistical Information Extraction System for Turkish
This thesis presents the results of a study on information extraction from unrestricted Turkish text using statistical language processing methods. We have successfully applied statistical methods using both lexical and morphological information to the following tasks: The Turkish Text Deasciifier task aims to convert the ASCII characters in a Turkish text into the corresponding non-ASCII Turkish characters (i.e., "ç", "ğ", "ı", "ö", "ş", "ü", and their upper cases).
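A brute-force sketch of the deasciification task: each ambiguous ASCII letter may stand for a Turkish character, and a candidate spelling is accepted if it appears in a lexicon. The tiny lexicon here stands in for the thesis's statistical model, which scores candidates rather than looking them up.

```python
from itertools import product

# ASCII letters that may stand for a Turkish character, with their candidates.
DEASCII = {"c": "ç", "g": "ğ", "i": "ı", "o": "ö", "s": "ş", "u": "ü"}

# A toy lexicon standing in for the trained statistical model.
LEXICON = {"çalışma", "okul", "üzüm"}

def deasciify_word(word):
    """Try every combination of ASCII/Turkish variants and keep the one
    found in the lexicon; fall back to the input unchanged."""
    options = [(ch, DEASCII[ch]) if ch in DEASCII else (ch,) for ch in word]
    for variant in product(*options):
        candidate = "".join(variant)
        if candidate in LEXICON:
            return candidate
    return word

print(deasciify_word("calisma"))  # → "çalışma"
print(deasciify_word("okul"))     # → "okul" (already correct)
```

Real deasciifiers avoid this exponential enumeration by deciding each letter from its local context, but the input/output behavior is the same.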
Implementing Voting Constraints with Finite State Transducers
We describe a constraint-based morphological disambiguation system in which individual constraint rules vote on matching morphological parses, and its implementation using finite state transducers. Voting constraint rules have a number of desirable properties: the outcome of the disambiguation is independent of the order in which the local contextual constraint rules are applied, so the rule developer is relieved from worrying about conflicting rule sequencing. The approach can also combine statistically and manually obtained constraints, and incorporate negative constraints that rule out certain patterns. The transducer implementation has a number of desirable properties compared to other finite state tagging and light parsing approaches implemented with automata intersection. The most important of these is that, since constraints do not remove parses, there is no risk of an overzealous constraint "killing" a sentence by removing all parses of a token during intersection. After a description of our approach, we present preliminary results from tagging the Wall Street Journal Corpus. With about 400 statistically derived constraints and about 570 manual constraints, we attain an accuracy of 97.82% on the training corpus and 97.29% on the test corpus. We then describe a finite state implementation of our approach and discuss various related issues.
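The core voting idea can be sketched in a few lines: every constraint adds its weight to the parses it matches, parses are never removed, and the highest-voted parse wins. The parses, constraints, and weights below are invented examples, not the paper's rules.

```python
def vote(parses, constraints):
    """Each constraint votes for the parses it matches. Because constraints
    only add votes and never remove parses, no token can lose all its
    readings, and the result is independent of constraint order.

    constraints: list of (predicate, weight) pairs (invented examples).
    """
    votes = {p: 0 for p in parses}
    for matches, weight in constraints:
        for p in parses:
            if matches(p):
                votes[p] += weight
    return max(parses, key=lambda p: votes[p])

# Toy ambiguity for one token: noun vs. verb reading (hypothetical tags).
parses = ["book+Noun", "book+Verb"]
constraints = [
    (lambda p: p.endswith("+Noun"), 2),   # e.g. "after a determiner, prefer Noun"
    (lambda p: p.endswith("+Verb"), 1),   # a weaker, conflicting constraint
]
print(vote(parses, constraints))  # → "book+Noun"
```

Reordering the `constraints` list leaves the totals, and hence the winner, unchanged, which is the order-independence property the abstract emphasizes.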
Tagging English by Path Voting Constraints
We describe a constraint-based tagging approach where individual constraint rules vote on sequences of matching tokens and tags. Disambiguation of all tokens in a sentence is performed at the very end by selecting the tags that appear on the path receiving the highest vote. This constraint application paradigm makes the outcome of the disambiguation independent of the rule sequence, and hence relieves the rule developer from worrying about the potentially conflicting rule sequencing found in other systems. The approach can also combine statistically and manually obtained constraints, and incorporate negative constraint rules that rule out certain patterns. We have applied this approach to tagging English text from the Wall Street Journal and the Brown corpora. Our results from the Wall Street Journal Corpus indicate that with 400 statistically derived constraint rules and about 657 hand-crafted constraint rules, we can attain an average accuracy of 97.56% on the training corpus and an average accuracy of 97.12% on the testing corpus with 11-fold cross-validation. We can also relax the single-tag-per-token limitation and allow ambiguous tagging, which lets us trade off recall against precision.
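Path voting, as opposed to per-token voting, scores whole tag sequences: every path through the sentence's ambiguity classes accumulates the votes of the rules it matches, and the highest-voted path is selected at the very end. A brute-force sketch, with invented tags and rules (a real tagger would use a dynamic-programming search rather than enumerating all paths):

```python
from itertools import product

def best_path(tag_options, rules):
    """Return the tag path with the highest total vote.

    tag_options: per-token tuples of candidate tags.
    rules: (pattern, weight) pairs, where a pattern is a tuple of tags that
           must occur as a contiguous subsequence of the path (a toy stand-in
           for the paper's constraint rules).
    """
    def score(path):
        total = 0
        for pattern, weight in rules:
            n = len(pattern)
            if any(path[i:i + n] == pattern for i in range(len(path) - n + 1)):
                total += weight
        return total
    return max(product(*tag_options), key=score)

# Toy ambiguity classes for a three-word sentence, plus invented rules.
tag_options = [("DT",), ("NN", "MD"), ("VBZ", "NNS")]
rules = [
    (("DT", "NN"), 3),    # determiner followed by noun
    (("NN", "VBZ"), 2),   # noun followed by 3sg verb
    (("MD", "VBZ"), 1),   # modal followed by 3sg verb (weaker)
]
print(best_path(tag_options, rules))  # → ("DT", "NN", "VBZ")
```

Because scoring happens only at the end, over complete paths, the outcome cannot depend on the order in which rules were written down.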
Name Tagging Using Lexical, Contextual, and Morphological Information
This paper presents a probabilistic model for automatically tagging names in Turkish text. We used four different information sources to model names and combined them successfully. Our first information source is based on the surface forms of the words. We then combined contextual cues with the lexical model and obtained a significant improvement. After this, we modeled the morphological analyses of the words and, finally, the tag sequence, reaching an F-measure of 91.56% on Turkish name tagging. Our results are important in that they show that using linguistic information (i.e., the morphological analyses of the words) together with a corpus large enough to train a statistical model helps this basic information extraction task.
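One common way to combine such information sources is to sum their per-source log-probabilities, so that lexical, contextual, and morphological evidence each contribute to a token's name score. The sketch below reduces each component model to a toy lookup table with invented probabilities; in the paper each is a trained statistical model and the tag sequence ties decisions together.

```python
import math

def name_score(word, prev_word, morph, weights=(1.0, 1.0, 1.0)):
    """Combine per-source log-probabilities that a token is part of a name.

    The three lookup tables are hypothetical stand-ins for the paper's
    lexical, contextual, and morphological models.
    """
    lexical = {"Ankara": 0.8}.get(word, 0.1)            # surface-form model
    contextual = {"in": 0.6}.get(prev_word, 0.3)        # cue from previous word
    morphological = {"+Prop": 0.9}.get(morph, 0.2)      # morphological analysis
    wl, wc, wm = weights
    return (wl * math.log(lexical)
            + wc * math.log(contextual)
            + wm * math.log(morphological))

# Higher (less negative) score = more name-like.
print(name_score("Ankara", "in", "+Prop") > name_score("masa", "the", "+Noun"))  # → True
```

Adding a source amounts to adding one more log term, which mirrors the abstract's incremental gains as context and morphology are folded in.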
Modeling the prosody of hidden events for improved word recognition
We investigate a new approach for using speech prosody as a knowledge source for speech recognition. The idea is to penalize word hypotheses that are inconsistent with prosodic features such as duration and pitch. To model the interaction between words and prosody, we modify the language model to represent hidden events such as sentence boundaries and various forms of disfluency, and combine it with decision trees that predict such events from prosodic features. N-best rescoring experiments on the Switchboard corpus show a small but consistent reduction in word error as a result of this modeling. We conclude with a preliminary analysis of the types of errors that are corrected by the prosodically informed model.
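N-best rescoring of this kind can be sketched as adding a prosody-based penalty to each hypothesis's recognizer score and reranking. The scores, penalties, and weights below are invented; in the paper the penalty comes from decision trees scoring the hypothesized hidden events against the prosodic features.

```python
def rescore(nbest, lm_weight=1.0, prosody_weight=0.5):
    """Rerank N-best hypotheses by combining the recognizer score with a
    prosodic penalty measuring disagreement between each hypothesis's
    hidden events (boundaries, disfluencies) and the prosodic predictions.
    All numbers are invented for illustration.
    """
    def combined(h):
        return lm_weight * h["score"] - prosody_weight * h["prosody_penalty"]
    return max(nbest, key=combined)

nbest = [
    {"words": "uh i mean the cat", "score": -10.0, "prosody_penalty": 1.0},
    {"words": "a i mean the cat",  "score": -9.8,  "prosody_penalty": 3.0},
]
print(rescore(nbest)["words"])  # → "uh i mean the cat": the penalty flips the ranking
```

Here the second hypothesis scores slightly better acoustically, but its hidden-event sequence fits the prosody poorly, so the penalty reverses the ranking, which is exactly the corrective effect the abstract reports.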